-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#20970] Fix gRPC leak by closing ResidualSource at BoundedToUnboundedSourceAdapter.Reader#init() in Dataflow worker #28548
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
01b3774
to
204c697
Compare
I put a comment |
retest this please. |
Run Java_GCP_IO_Direct PreCommit |
R: @kennknowles |
R: @Abacn |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
@@ -288,6 +288,15 @@ private void init( | |||
residualElementsList == null | |||
? new ResidualElements(Collections.emptyList()) | |||
: new ResidualElements(residualElementsList); | |||
|
|||
if (this.residualSource != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking - this means that we are calling init()
again without ever calling close()
on the reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's at Reader.getCheckpointMark().
The getCheckpointMark()
is called at bundle finish in Dataflow streaming jobs, and close()
will be pending as readers are cached. A reader can be reused in next bundles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea based on that the change is obviously correct. Thanks!
I can see that testing a fix for a memory leak may be challenging, but can we do anything to test this change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to make a test to confirm the effect of this change :) I'll ping when it's ready.
@@ -288,6 +288,15 @@ private void init( | |||
residualElementsList == null | |||
? new ResidualElements(Collections.emptyList()) | |||
: new ResidualElements(residualElementsList); | |||
|
|||
if (this.residualSource != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's at Reader.getCheckpointMark().
The getCheckpointMark()
is called at bundle finish in Dataflow streaming jobs, and close()
will be pending as readers are cached. A reader can be reused in next bundles.
@@ -288,6 +288,15 @@ private void init( | |||
residualElementsList == null | |||
? new ResidualElements(Collections.emptyList()) | |||
: new ResidualElements(residualElementsList); | |||
|
|||
if (this.residualSource != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea based on that the change is obviously correct. Thanks!
@@ -505,6 +514,7 @@ BoundedSource<T> getSource() { | |||
} | |||
|
|||
Checkpoint<T> getCheckpointMark() { | |||
checkArgument(!closed, "getCheckpointMark() call on closed %s", getClass().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be checkState
since there are no arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -776,7 +776,7 @@ public double getRemainingParallelism() { | |||
|
|||
private static class UnboundedReaderIterator<T> | |||
extends NativeReader.NativeReaderIterator<WindowedValue<ValueWithRecordId<T>>> { | |||
private final UnboundedSource.UnboundedReader<T> reader; | |||
private final UnboundedSource.UnboundedReader<T> reader; // not owned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does ownership mean here? Not to do with memory management obviously. If I understand correctly, it means that there may be other calls to various state methods so you can never assume what state the reader will be in? (it would be worth writing the whole long comment about what this means here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I added a comment to explain it in detail, rather than a short ambiguous comment.
@kennknowles I added a unit test |
I feel like I want to run a bit of a variety of integration tests but I'm not 100% sure which ones. |
Run Java_IOs_Direct PreCommit |
OK the failure we see has been fixed at HEAD. Can you rebase against the master branch and we can re-run to get green. |
Run Java_GCP_IO_Direct PreCommit |
Run Java PreCommit |
…ReaderIterator.reader
…rceAdapter.Reader
Run Java_GCP_IO_Direct PreCommit |
Run Java PreCommit |
Run Java_GCP_IO_Direct PreCommit |
@kennknowles Some tests were flaky. But, finally, it passed all the checks. |
This is another case of #20970 (harmless error logs with gRPC leak).
BigQueryIO.read()
withMethod.DIRECT_READ
in Dataflow streaming jobs can cause Dataflow to output the gRPC leak error logs like below.This is because
getCheckpointMark()
inBoundedToUnboundedSourceAdapter
callsinit()
whereresidualSource
field is updated, but it missed closing oldresidualSource
, which caused a leak ofResidualSource#reader
closing. Thereader
in the stacktrace below isBigQueryStorageStreamReader
. Not closingBigQueryStorageStreamReader
causes the gRPC leak error logs.Error mesasge:
Stacktrace: